Spatiotemporal trends in COVID-19 vaccine sentiments on a social media platform and correlations with reported vaccine coverage

Abstract
Objective To assess spatiotemporal trends in, and determinants of, the acceptance of coronavirus disease 2019 (COVID-19) vaccination globally, as expressed on the social media platform X (formerly Twitter).
Methods We collected over 13 million posts on the platform regarding COVID-19 vaccination made between November 2020 and March 2022 in 90 languages. Multilingual deep learning XLM-RoBERTa models annotated all posts using an annotation framework after being fine-tuned on 8125 manually annotated, English-language posts. The annotation results were used to assess spatiotemporal trends in COVID-19 vaccine acceptance and confidence as expressed by platform users in 135 countries and territories. We identified associations between spatiotemporal trends in vaccine acceptance and country-level characteristics and public policies by using univariate and multivariate regression analysis.
Findings A greater proportion of platform users in the World Health Organization’s South-East Asia, Eastern Mediterranean and Western Pacific Regions expressed vaccine acceptance than users in the rest of the world. Countries in which a greater proportion of platform users expressed vaccine acceptance had higher COVID-19 vaccine coverage rates. Trust in government was also associated with greater vaccine acceptance. Internationally, vaccine acceptance and confidence declined among platform users as: (i) vaccination eligibility was extended to adolescents; (ii) vaccine supplies became sufficient; (iii) nonpharmaceutical interventions were relaxed; and (iv) global reports on adverse events following vaccination appeared.
Conclusion Social media listening could provide an effective and expeditious means of informing public health policies during pandemics, and could supplement existing public health surveillance approaches in addressing global health issues.


Introduction
Combatting a global pandemic, such as the coronavirus disease 2019 (COVID-19) pandemic, requires a multifaceted response from governments. Vaccination campaigns and nonpharmaceutical interventions, including city-wide lockdowns and travel restrictions, have a far-reaching impact on society and their effectiveness is contingent on public compliance. Consequently, policy-makers' understanding of the impact of their decisions and the way they adjust policy in response to public concerns are key components of any effective public health intervention.
Monitoring data on social media through social media listening can play a crucial role in assisting policy-makers. By using advanced machine learning techniques, a nuanced narrative that reveals social attitudes, perceptions and actions can be constructed from simple textual (and visual) information on social media. Although social media users do not accurately represent the general population, geographical and temporal trends in their attitudes can reveal how the global or local social environment is reshaping people's mindsets. In addition, social media listening enables researchers and policy-makers to scrutinize the ever-changing dynamics of the public's response to public health measures in a cost-effective and expeditious way. 1,24][5][6][7][8] In contrast, little is known about attitudes to vaccines in low- and middle-income countries.
In this study, we fine-tuned multilingual deep learning models to analyse posts on X (formerly tweets) on COVID-19 vaccination in 90 languages that were made between late 2020 and early 2022. We assessed global geographical and temporal trends in the acceptance of COVID-19 vaccines among platform users from 135 countries and territories, and validated our findings using statistical data on COVID-19 vaccination coverage. We also explored the determinants of trends in COVID-19 vaccine acceptance among platform users. The overall intention of our investigation of COVID-19 vaccine acceptance was to demonstrate how social media listening can be employed effectively in the public health domain.
Global trends in COVID-19 vaccine acceptance Xinyu Zhou et al.
lingually, we developed an annotation framework for vaccine-related posts, which was first used by humans (i.e. not machines) to annotate a sample of posts. Then, we fine-tuned multilingual deep learning models to imitate human annotations and, finally, we annotated all posts available using the fine-tuned deep learning models.

Collection of posts on social media platform X
We used the social media platform X as our data source because it is one of the world's most popular social media platforms. Between 2020 and 2023, the number of users worldwide fluctuated around 350 million. We identified 1027 keywords relating to COVID-19 vaccination that covered 90 major languages (details are available from the online repository). 9 Using these keywords, we collected 13 093 406 publicly available posts on COVID-19 vaccination made in various languages between 13 November 2020 and 5 March 2022; these were all such posts identified by the Meltwater media monitoring and social listening platform (Meltwater, San Francisco, USA). We collected original and quote posts (i.e. secondary posts containing the original post with additional comments) but excluded simple secondary posts and replies.

Annotation framework
We used the confidence, complacency and convenience (i.e. 3Cs) model of vaccine hesitancy proposed by the World Health Organization (WHO) to develop an annotation framework for COVID-19 vaccine-related posts, 10 which was validated in a sample of 500 posts. Vaccine acceptance and vaccine refusal were the core measures. In addition, we investigated determinants of vaccine acceptance, such as confidence in vaccines, the online information environment and perceived barriers to accessing vaccines. Specifically, our annotation framework covered four key concepts related to COVID-19 vaccination and included eight categories (Table 1). First, COVID-19 vaccine acceptance covered the categories of: (i) intent to accept vaccination; and (ii) intent to refuse vaccination. Second, confidence in COVID-19 vaccines covered: (iii) belief that vaccines are effective; (iv) belief that vaccines are not safe; and (v) distrust in government. Third, the online information environment regarding COVID-19 vaccines covered: (vi) misinformation or rumours about vaccines. Fourth, perceived barriers to accessing COVID-19 vaccines covered: (vii) vaccine accessibility; and (viii) vaccine equity.
Using this framework, two annotators independently annotated 8125 English-language posts on COVID-19 vaccination. Any disagreement was resolved by a third annotator. There were two main steps: (i) each annotator separated human-generated posts from news reports, advertisements, government announcements and posts generated by automated (i.e. bot) accounts; and (ii) each human-generated post was annotated according to its relevance to the eight annotation framework categories. A post could be relevant to one or more categories or to none. Examples of annotated posts are available in the online repository. 9

Fine-tuning multilingual deep learning models
Multilingual deep learning models are pretrained on textual data sets containing billions of words in multiple languages, and can develop a cross-lingual understanding of natural language. 11 Fine-tuning these models using a small, manually annotated, task-specific data set in a single language enables them to perform the same task in around 100 languages without the need for translation. 12,13 To analyse COVID-19 vaccine-related posts in 90 languages, we fine-tuned several multilingual deep learning models based on the recent, state-of-the-art, multilingual model, XLM-RoBERTa (HuggingFace, 2023), 14 using our manually annotated, English-language data set. To do this, we randomly selected 80% of our 8125 manually annotated posts as a training set, 10% as a validation set and 10% as a retained test set (details are available from the online repository). 9 The models learned how to annotate posts from the training set, and the validation set enabled us to determine hyperparameters (i.e. settings that influence how a machine learning model learns and performs). In the test set, the resulting deep learning models achieved a precision of 61.8% to 89.7% in automatically identifying human-generated posts and annotating them as relevant to the eight annotation framework categories (Table 1). When applied to the 13 093 406 publicly available posts on COVID-19 vaccination in various languages, the fine-tuned XLM-RoBERTa models identified 6 046 183 posts as not sent by humans. The models then annotated the remaining 7 047 223 human-generated posts according to their relevance to the eight annotation framework categories.
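The 80%/10%/10% split and the precision and F1-score used to evaluate the models can be illustrated with a minimal Python sketch. It is not the study's code: the function names, the fixed random seed, the rounding of the 10% shares (812 and 813 posts) and the use of binary labels for a single category are all assumptions made for illustration.

```python
import random

def split_dataset(items, train_pct=80, val_pct=10, seed=0):
    """Shuffle items and split them into training, validation and test lists.

    The seed and the integer rounding are illustrative assumptions.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * train_pct // 100
    n_val = len(shuffled) * val_pct // 100
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # whatever remains is held out
    return train, val, test

def precision_and_f1(gold, predicted):
    """Precision and F1-score for one binary category (1 = relevant).

    gold holds the human annotations, treated as the reference standard.
    """
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == 1 and p == 1)
    fp = sum(1 for g, p in pairs if g == 0 and p == 1)
    fn = sum(1 for g, p in pairs if g == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, f1

# Post identifiers stand in for the 8125 manually annotated posts.
train, val, test = split_dataset(range(8125))
print(len(train), len(val), len(test))  # 6500 812 813
```

Precision is the share of posts the model marked as relevant that the human annotators also marked as relevant; the F1-score is the harmonic mean of precision and recall.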

Statistical analysis
For each of the eight categories in our annotation framework, we derived the aggregate expressed opinion of each user on the platform by averaging all annotations of their posts made by the fine-tuned XLM-RoBERTa models within each specific time period. Then we evaluated the average aggregate expressed opinions of platform users over different time intervals and in different geographical locations. Time trends were evaluated using all human-generated posts. In contrast, geographical variations were evaluated using human-generated posts for which geolocation data were available from Meltwater, which uses a platform user's profile data to derive the best estimate of their geographical location. We assessed geographical variations in 135 countries and territories with adequate data (details are available in the online repository). 9
We used univariate and multivariate linear regression to identify determinants of COVID-19 vaccine acceptance and coverage across 135 countries and territories. In the regression analysis, we considered vaccination-related opinions on the platform X and 20 country variables, such as governance, pandemic preparedness, level of public trust, cultural factors (e.g. individualism), level of social development and demographic characteristics. Data on country variables were obtained from external sources (details are available from the online repository). 9
Human-generated, vaccine-related posts were assessed on a daily basis, and spline regression was employed to fit global temporal trends in COVID-19 vaccine acceptance, vaccine confidence, the online information environment and perceived barriers to accessing vaccines. Country-level trends were assessed on a weekly and monthly basis for countries with sufficient data available. To explore determinants of temporal trends in vaccine acceptance, we constructed a country-level, weekly, panel data set that covered data on vaccine acceptance from posts on the platform and six indicators from external sources, including each country's policies on vaccination and nonpharmaceutical interventions, and global reports of adverse events following immunization (details are available from the online repository). 9 The panel data analysis employed a fixed-effects model.
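The two-stage averaging described in this section (first within each user and time period, then across users) can be sketched as follows. This is an illustrative reading of the text, not the study's code: the function name, the tuple layout and the use of binary labels (1 if a post was annotated as relevant to a category, 0 otherwise) are assumptions.

```python
from collections import defaultdict
from statistics import mean

def user_level_proportion(posts):
    """Average annotations per user and period, then average across users.

    posts: iterable of (user_id, period, label) tuples, where label is 1 if
    the post was annotated as relevant to a category and 0 otherwise
    (an assumed encoding).
    Returns a dict mapping each period to the mean of the per-user means.
    """
    labels_by_user = defaultdict(list)
    for user_id, period, label in posts:
        labels_by_user[(period, user_id)].append(label)

    user_means_by_period = defaultdict(list)
    for (period, _user_id), labels in labels_by_user.items():
        user_means_by_period[period].append(mean(labels))

    return {period: mean(values) for period, values in user_means_by_period.items()}

# Hypothetical example: one prolific user and one occasional user.
example_posts = [
    ("u1", "2021-01", 1),
    ("u1", "2021-01", 1),
    ("u1", "2021-01", 0),
    ("u2", "2021-01", 0),
]
result = user_level_proportion(example_posts)
```

In this example the user-level average is 1/3, whereas a simple average over posts would be 1/2. Averaging within users first prevents highly active accounts from dominating the estimate.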
All data analyses were performed using Python v. 3.7.2 (Python Software Foundation, Wilmington, USA) and R v. 4.2.1 (The R Foundation, Vienna, Austria). The study was approved by the Institutional Review Board of the School of Public Health, Fudan University, Shanghai, China (IRB#2022-01-0938).

Results
The XLM-RoBERTa deep learning models identified and annotated

Online information environment
(vi) Misinformation or rumours about COVID-19 vaccines: The post contains negative information on vaccines, such as misinformation, rumours or references to anti-vaccine or anti-science campaigns or vaccine scandals. F1-score: 0.750; precision: 0.618

Perceived barriers to accessing vaccines
(vii) COVID-19 vaccine accessibility: The post refers to production or supply limitations affecting the COVID-19 vaccine or the platform user's ability to access vaccine. F1-score: 0.682; precision: 0.732
(viii) COVID-19 vaccine equity: The post refers to (priority) vaccination groups or to equity in vaccine allocation. F1-score: 0.809; precision: 0.859

COVID-19: coronavirus disease 2019.
a The performance of deep learning models was assessed by comparing their annotations with annotations made by humans, with human annotation as the gold standard. The F1-score and precision are evaluation metrics widely used in machine learning (details are available in the online repository). 9

users (89.7%) made three or fewer posts on COVID-19 vaccination during the study period (details are available from the online repository). 9 Of the 4 137 550 posts with geolocation data, 1 801 100 (43.5%) came from the United States; the United Kingdom accounted for 370 200 (8.9%); Canada for 250 001 (6.0%); Japan for 198 617 (4.8%); and India for 165 017 (4.0%). During the study period, the number of posts increased markedly in December 2020, when the first COVID-19 vaccine was approved, and decreased after January 2021 (details are available from the online repository). 9

Geographical variation
Fig. 2 illustrates the proportion of social media platform users from different countries and territories whose posts were relevant to the eight annotation framework categories during the study period (full details are available from the online repository). 9 Acceptance of, and confidence in, COVID-19 vaccines varied considerably across WHO regions: the proportion of platform users who expressed COVID-19 vaccine acceptance throughout the study period varied from 33.2% to 78.1% across countries and territories and the proportion who expressed an intention to refuse vaccination varied from 5.9% to 24.9%. Platform users in the South-East Asia, Eastern Mediterranean and Western Pacific Regions more often expressed vaccine acceptance and confidence in vaccine effectiveness and safety than users in the African Region, the Region of the Americas or the European Region. Countries in the South-East Asia Region accounted for four of the 10 countries or territories with the highest proportion of platform users who expressed vaccine acceptance: the proportion was 78.1% (772/989) of users in Bangladesh; 68.0% (62 797/92 349) of users in India; 66.0% (1147/1738) of users in Nepal; and 64.3% (20 398/31 724) of users in Indonesia.
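The percentages quoted for the four South-East Asia Region countries follow directly from the user counts given in parentheses. A short check, assuming rounding to one decimal place, is:

```python
# Users expressing vaccine acceptance and total users, as quoted in the text.
counts = {
    "Bangladesh": (772, 989),
    "India": (62_797, 92_349),
    "Nepal": (1_147, 1_738),
    "Indonesia": (20_398, 31_724),
}

# Proportion of users expressing acceptance, as a percentage to one decimal.
percentages = {
    country: round(100 * accepting / total, 1)
    for country, (accepting, total) in counts.items()
}

for country, value in percentages.items():
    print(f"{country}: {value}%")
```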
Of the 10 countries or territories with the highest proportion of platform users who expressed an intention to refuse vaccination, four were from the Region of the Americas: the proportion was 24

Determinants of country-level variation
Univariate linear regression found that, in aggregate, users' opinions on vaccine confidence, the online information environment and perceived barriers to accessing COVID-19 vaccines were strongly associated with vaccine acceptance and refusal (full details are available from the online repository). 9 The only country-level characteristics that had a significant positive association with vaccine acceptance were trust in government and internet coverage. Country-level characteristics that had a significant negative association with vaccine refusal included better governance, pandemic preparedness, trust in government and the level of social development. Multivariate regression, which controlled for other country-level characteristics, found that trust in government remained significantly associated with vaccine acceptance (Table 2 and Table 3). Furthermore, multivariate linear regression confirmed that users' expression of vaccine acceptance was significantly associated with COVID-19 vaccination coverage at the country level (Table 4; full details are available from the online repository). 9
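For readers unfamiliar with the method, a univariate linear regression of a country-level outcome on a single country-level variable reduces to ordinary least squares with one predictor. The sketch below is illustrative only: the function name is an assumption and the data are made up, not taken from the study.

```python
def simple_ols(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical values: a trust-in-government index (x) and the proportion of
# users expressing vaccine acceptance (y) for five imaginary countries.
trust_index = [0.2, 0.4, 0.5, 0.7, 0.9]
acceptance = [0.41, 0.48, 0.52, 0.60, 0.66]
slope, intercept = simple_ols(trust_index, acceptance)
```

A positive slope corresponds to a positive association. The multivariate models add further country-level variables as predictors, which this single-predictor sketch does not cover.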

Temporal trends
Fig. 3  The proportion of platform users who posted on COVID-19 vaccine accessibility remained largely stable until June 2021, at around a daily average of 8.5%, when it began to decline gradually. Posts on COVID-19 vaccine equity gradually increased in the second half of 2021 until the daily average proportion reached 17.4% in August 2021 and stabilized thereafter. Temporal trends at regional and country levels were similar to global trends (Fig. 4; additional details are available from the online repository). 9

Determinants of temporal trends
Table 5 presents the findings of a fixed-effects regression analysis that explored determinants of temporal trends in COVID-19 vaccine acceptance. The proportion of users whose posts expressed an intention to accept COVID-19 vaccination declined significantly when vaccination eligibility was extended to adolescents and vaccine supply became sufficient. In addition, the proportion of users who expressed an intention to refuse vaccination increased significantly when: (i) global reports of adverse events following immunization appeared; (ii) vaccination eligibil-

Discussion
Our study used multilingual social media listening to monitor geographical and temporal trends in opinions about COVID-19 vaccination expressed on the social media platform X. We assessed over 7 million human-generated posts from 135 countries and territories between the emergency approval of COVID-19 vaccines and the time when over half of the world's population had been vaccinated. We found a promising association between the proportion of users who expressed COVID-19 vaccine acceptance and real-world vaccination coverage worldwide. We also found that vaccine acceptance was more common among users in WHO's South-East Asia, Eastern Mediterranean and Western Pacific Regions than in the rest of the world, and that vaccine acceptance and confidence decreased as reports of adverse events following immunization emerged. These insights into geographical and temporal trends in vaccine acceptance could be valuable for devising proactive responses to potential vaccine hesitancy involving timely and targeted interventions.
Social media listening based on multilingual deep learning models can supplement the existing public health surveillance techniques used to address global health issues. 17,18 This novel approach has several advantages: (i) monitoring can be conducted in real time, thereby enabling timely interventions; (ii) it is cost-effective and could be applied in low-resource settings, thereby improving research capacity and pandemic responses in low- and middle-income countries; 8 and (iii) it could provide real-time insights into public sentiment to inform public health interventions, especially during outbreaks and pandemics. 8 Unlike traditional research methods such as surveys, social media listening can rapidly and thoroughly scan the whole dynamic information environment for digital opinions derived from public contributions and interactions, without researcher involvement. 19 Moreover, as it is not affected by the reporting bias that can result from interactions with researchers, 20 social media listening can be particularly useful for research on sensitive public health issues.

a Fragility refers to the state's incapacity to provide essential public goods and services and to cope with shocks. 16
Notes: The multivariate linear regression analysis included variables for 78 countries or territories: (i) for which at least 65 data points were available; (ii) that were significant at the P < 0.1 level on univariate analysis; and (iii) that were free from multicollinearity (i.e. the correlation coefficient with another variable was < 0.8). If the correlation coefficient between two variables was ≥ 0.8, only one was included. Univariate linear regression results are available in the online repository. 9
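The multicollinearity rule in the table notes (if two variables have a correlation coefficient of 0.8 or more, only one is included) can be sketched as a simple filter. The notes do not say which variable of a correlated pair was retained or whether the sign of the coefficient mattered, so keeping the first variable encountered and comparing absolute values are assumptions, as are the function names and example data.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient; returns 0.0 if either series is constant."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)

def drop_collinear(variables, threshold=0.8):
    """Keep a variable only if |r| with every already-kept variable is below threshold.

    variables: dict mapping a variable name to its list of country-level values.
    Keeping the earlier variable of a correlated pair is an assumed tie-break.
    """
    kept = []
    for name, values in variables.items():
        if all(abs(pearson_r(values, variables[k])) < threshold for k in kept):
            kept.append(name)
    return kept

# Hypothetical country-level variables for four imaginary countries.
candidates = {
    "governance": [1.0, 2.0, 3.0, 4.0],
    "social_development": [2.0, 4.0, 6.0, 8.1],  # nearly collinear with governance
    "individualism": [4.0, 1.0, 3.0, 2.0],
}
selected = drop_collinear(candidates)
print(selected)  # ['governance', 'individualism']
```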
Nevertheless, social media listening faces its own challenges, such as: (i) the potential non-representativeness of social media data; (ii) susceptibility to short-term noise (i.e. random fluctuations in opinion); (iii) a lack of demographic information; and (iv) a reliance on manually annotated data. First, there is an inherent bias in social media data because users may not express their genuine opinions online. Also, social media users are typically skewed towards younger individuals, who may be over-represented in anti-vaccine groups. Second, social media data are subject to short-term noise because emerging news may trigger disproportionate discussions on particular topics. Third, data on social media users' demographic characteristics are generally unavailable, which limits in-depth analyses at the individual level. On the other hand, ecological analysis can be widely employed to identify associations at the population level, though it cannot infer causality. Fourth, social media listening based on deep learning models depends on domain-specific fine-tuning that relies heavily on the accurate incorporation of manual annotations.
Recognition of the merits and limitations of social media listening and its careful integration into public health surveillance are crucial for optimizing its effectiveness. Although platform users may not be representative of the general population, the young people and anti-vaccine groups concentrated on the platform still warrant attention from policy-makers. Social media listening can also be applied to other social media platforms, such as Facebook, Reddit and Instagram, which may be used by hard-to-reach population groups. In addition, social media's sensitivity to news items and short-term events provides an opportunity to understand their impact on attitudes to vaccines. Moreover, social media analysis facilitates large-scale spatiotemporal analysis, which may not be possible with traditional surveillance approaches, such as surveys. Finally, our analytical approach can be adapted to incorporate multilingual versions of few-shot classification using pattern-exploiting training and SetFit (sentence transformer fine-tuning), 21,22 which ensure good model performance even when manually annotated training data are scarce.

Intent to reject COVID-19 vaccination
COVID-19: coronavirus disease 2019. Notes: The graphs show daily global trends, as derived using spline regression, in the proportion of users who posted on coronavirus disease 2019 vaccination between 13 November 2020 and 5 March 2022. The days on which vaccine adverse events were reported are indicated by the vertical lines. Definition of each category is presented in Table 1.
Notes: The illustration shows data for countries from which a total of ≥ 2500 users posted on coronavirus disease 2019 vaccination between 13 November 2020 and 5 March 2022, with posts from at least 30 users per month. Egypt and Nepal were also included: although the total number of users posting in each of these countries was < 2500, there were at least 30 users per month in each country. In total, data from 50 countries or territories are shown, with each point in the figure representing the monthly proportion of users who posted on each topic. Definition of each category is presented in Table 1.
4][25][26][27] We also found that vaccine acceptance on the platform could be a key predictor of vaccination coverage in the real world. In practice, the reliability of predictions based on social media listening could be verified by demonstrating consistency with surveys and real-world vaccination coverage. However, social media listening may produce underestimates of actual vaccination coverage, which has also been observed in survey-based studies. 28,29 These underestimates may arise, in part, from compulsory vaccination policies: some vaccinated individuals with negative views about vaccination may express them on social media. In addition, the high prevalence of anti-vaccine groups on social media may skew online opinions about vaccine acceptance.
Our study highlighted the importance of trust in boosting vaccine acceptance and coverage, which is consistent with previous research suggesting that a high level of trust was associated with greater COVID-19 vaccine coverage and lower COVID-19 infection rates. 30,31 Trust has also been associated with compliance with public health regulations, such as mask-wearing and observance of social distancing rules. 32 Consequently, building trust in government is a priority for policy-makers seeking to promote compliance with public health interventions, including vaccination.
We observed a disturbing, continuous decline in COVID-19 vaccine acceptance and confidence after March 2021, when reports of adverse events following immunization emerged worldwide. This decline was also observed in previous surveys. 33,34 The decline in vaccine acceptance presents a formidable challenge for policy-makers globally who depend on vaccination campaigns to combat pandemics and reduce preventable deaths. 33,35 Policy-makers should proactively prepare to increase public support for vaccination in future pandemics, in addition to implementing public health surveillance.
Our study indicated that the main determinants of declining COVID-19 vaccine acceptance were: (i) the extension of vaccination eligibility to adolescents; (ii) a sufficient vaccine supply; (iii) the relaxation of nonpharmaceutical interventions; and (iv) reports of adverse events following immunization. Social media listening can provide an early indication of declining vaccine acceptance following changes in vaccination or nonpharmaceutical intervention policies, thereby enabling a prompt public policy response.
In summary, social media listening using machine learning can address complex public health issues across diverse settings and in many languages. We believe this is a new frontier for public health and medical surveillance that will provide policy-makers with near-real-time insights into public perceptions and views. Recognizing public fears and their origins is the first step in devising a rapid educational response. Insights from such surveillance can also help in anticipating similar fears in the future. In future pandemics, the acceptance of newly developed vaccines could be suboptimal and could decline, as occurred with COVID-19 vaccines. Consequently, key stakeholders and officials should make early preparations to ensure public support for vaccination.


Fig. 1. Flowchart, study of global trends in COVID-19 vaccine acceptance by social media platform users, November 2020 to March 2022

Fig. 3. Global temporal trends in COVID-19 vaccine acceptance, vaccine confidence, the online information environment and perceived barriers to accessing vaccines, as expressed by social media platform users, November 2020 to March 2022

Fig. 4. National monthly trends in COVID-19 vaccine acceptance, vaccine confidence, the online information environment and perceived barriers to accessing vaccines, as expressed by social media platform users, November 2020 to March 2022

cumulative variation in COVID-19 vaccine acceptance, vaccine confidence, the online information environment and perceived barriers to accessing vaccines, as expressed by social media platform users, November 2020 to March 2022

Table 4. Country-level determinants of COVID-19 vaccination coverage, on multivariate linear regression, study of global trends in COVID-19 vaccine acceptance by social media platform users, November 2020 to March 2022
Variable: Proportion of platform users expressing intent to accept COVID-19 vaccination
Association with COVID-19 vaccination coverage, regression coefficient (95% CI):
CI: confidence interval; COVID-19: coronavirus disease 2019.

Table 3. Country-level determinants of social media platform users expressing COVID-19 vaccine refusal, on multivariate linear regression, study of global trends in COVID-19 vaccine acceptance by platform users, November 2020 to March 2022
Columns: Variable; Association with platform users' intent to refuse COVID-19 vaccination, regression coefficient (95% CI)
CI: confidence interval; COVID-19: coronavirus disease 2019.

Table 5. Determinants of temporal variation in social media platform users expressing COVID-19 vaccine acceptance, on panel data analysis, study of global trends in COVID-19 vaccine acceptance by social media platform users, November 2020 to March 2022
a The panel data analysis, which involved 1716 country-week observations, employed a fixed-effects model and included lags on independent variables.
b Vaccine availability implies that COVID-19 vaccine was available and the public was being vaccinated in the country during the week concerned. In contrast, the vaccine supply became sufficient when the COVID-19 vaccine supply was more than sufficient for the country's population.
Note: The analysis included data from only the 26 countries or territories with sufficient platform users' posts on coronavirus disease 2019 (COVID-19) vaccination (i.e. total number of users > 5000, with a weekly number > 28 before the week of 20 February 2022 and > 18 thereafter).
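A fixed-effects model with lagged independent variables, as named in the note above, can be illustrated with the within (demeaning) estimator for a single predictor. This is a didactic sketch rather than the study's specification: the function names, the one-week lag, the single predictor and the example numbers are all assumptions, and the actual analysis included several indicators.

```python
from collections import defaultdict

def lagged_rows(panel, lag=1):
    """Pair each outcome with the predictor value observed `lag` weeks earlier.

    panel: dict mapping country to a list of (x, y) tuples ordered by week.
    Returns a list of (country, lagged_x, y) rows.
    """
    rows = []
    for country, series in panel.items():
        for t in range(lag, len(series)):
            rows.append((country, series[t - lag][0], series[t][1]))
    return rows

def within_estimator(rows):
    """Slope of y on x after subtracting each country's own means.

    Demeaning removes any time-invariant country effect, which is the core
    idea of a one-way fixed-effects model.
    """
    by_country = defaultdict(list)
    for country, x, y in rows:
        by_country[country].append((x, y))
    sxx = 0.0
    sxy = 0.0
    for observations in by_country.values():
        mean_x = sum(x for x, _ in observations) / len(observations)
        mean_y = sum(y for _, y in observations) / len(observations)
        for x, y in observations:
            sxx += (x - mean_x) ** 2
            sxy += (x - mean_x) * (y - mean_y)
    return sxy / sxx

# Hypothetical rows: two countries with different baselines but the same slope.
example_rows = [
    ("A", 0, 10), ("A", 1, 13), ("A", 2, 16),
    ("B", 0, -5), ("B", 1, -2), ("B", 2, 1),
]
fe_slope = within_estimator(example_rows)
```

Because each country's mean is subtracted before fitting, the very different baselines of the two hypothetical countries do not affect the estimated slope.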